This report analyzes the partisan lean of visitors to ~328,000 points of interest (POIs) in Ohio during 2023. Partisan lean is calculated by matching visitor home Census Block Groups (CBGs) to 2020 presidential election results, weighted by visitor counts.
Key Metric: rep_lean = Two-party Republican vote share (Trump / (Trump + Biden)) weighted by visitor origin CBGs.
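As a minimal sketch of the metric for a single POI-month: the table of home CBGs below is hypothetical (the column names `visitors`, `trump_votes`, and `biden_votes` and all numbers are illustrative, not drawn from the report's data). It also assumes the weighting is a visitor-weighted average of CBG-level shares, which is one reading of the definition above.

```r
# Hypothetical visitor origins for one POI in one month (illustrative numbers only)
cbg <- data.frame(
  cbg_id      = c("A", "B", "C"),
  visitors    = c(10, 30, 60),
  trump_votes = c(300, 450, 200),
  biden_votes = c(700, 550, 300)
)

# Two-party Republican share in each home CBG: Trump / (Trump + Biden)
cbg$two_party_rep <- cbg$trump_votes / (cbg$trump_votes + cbg$biden_votes)

# POI-level rep_lean: average of the CBG shares, weighted by visitor counts
# (assumption: shares are averaged rather than votes being pooled)
rep_lean <- weighted.mean(cbg$two_party_rep, w = cbg$visitors)
rep_lean
```

With these made-up inputs the CBG shares are 0.30, 0.45, and 0.40, and the visitor weights pull the POI's value toward the CBG that sends the most visitors.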
| Comparison | Gap |
|---|---|
| Whole Foods (40%) vs Kroger (54%) | 14 pts |
| Target (48%) vs Walmart (58%) | 10 pts |
| Starbucks (49%) vs Dunkin’ (52%) | 3 pts |
The partisan lean measure captures where a business is located more than what kind of business it is. However, there is meaningful variation across business types within the same neighborhood, which suggests some consumer sorting by partisanship beyond pure geographic sorting.
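One way to separate geographic sorting from within-neighborhood sorting is to subtract each neighborhood's mean lean from its POIs and compare business types on what remains. The sketch below does this on made-up data; the `neighborhood`, `type`, and `rep_lean` values are hypothetical, and the approach is an illustration rather than the method used in this report.

```r
# Made-up POIs in two neighborhoods (all values hypothetical)
pois <- data.frame(
  neighborhood = c("urban", "urban", "urban", "rural", "rural", "rural"),
  type         = c("coffee", "grocery", "hardware", "coffee", "grocery", "hardware"),
  rep_lean     = c(0.30, 0.34, 0.38, 0.66, 0.70, 0.74)
)

# Geographic component: the mean lean of each neighborhood
pois$nbhd_mean <- ave(pois$rep_lean, pois$neighborhood, FUN = mean)

# Within-neighborhood component: deviation from the neighborhood mean
pois$within_dev <- pois$rep_lean - pois$nbhd_mean

# Average deviation by business type, net of geography
type_effect <- tapply(pois$within_dev, pois$type, mean)

# Share of total variance accounted for by neighborhood means alone
geo_share <- var(pois$nbhd_mean) / var(pois$rep_lean)

type_effect
geo_share
```

In this toy setup nearly all of the variance sits between neighborhoods, yet business types still differ by a few points once the neighborhood mean is removed, which is the pattern the paragraph above describes.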
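# Packages used throughout this report
library(arrow)       # read_parquet()
library(dplyr)       # data manipulation and %>%
library(tidyr)       # pivot_wider()
library(ggplot2)     # plots
library(ggridges)    # geom_density_ridges_gradient()
library(scales)      # percent_format(), comma_format(), percent()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
# Load the Ohio 2023 data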
df <- read_parquet("/global/scratch/users/maxkagan/project_westwood/outputs/location_partisan_lean/ohio_2023.parquet")
# Convert month to date
df <- df %>%
mutate(
month_date = as.Date(month),
month_label = format(month_date, "%b %Y")
)
# Summary stats
n_records <- nrow(df)
n_pois <- n_distinct(df$placekey)
n_months <- n_distinct(df$month)
mean_lean <- mean(df$rep_lean, na.rm = TRUE)
Data Overview:
ggplot(df, aes(x = rep_lean)) +
geom_histogram(aes(y = after_stat(density)), bins = 50, fill = "steelblue", alpha = 0.7) +
geom_density(color = "darkred", linewidth = 1) +
geom_vline(xintercept = 0.5, linetype = "dashed", color = "black", linewidth = 0.8) +
geom_vline(xintercept = mean_lean, linetype = "solid", color = "red", linewidth = 0.8) +
annotate("text", x = 0.52, y = 3, label = "Even (0.5)", hjust = 0, size = 3) +
annotate("text", x = mean_lean + 0.02, y = 2.5, label = paste0("Mean (", round(mean_lean, 3), ")"),
hjust = 0, size = 3, color = "red") +
labs(
title = "Distribution of Visitor Partisan Lean Across Ohio POIs",
subtitle = "2023 monthly observations",
x = "Republican Lean (Two-Party Vote Share)",
y = "Density"
) +
scale_x_continuous(labels = percent_format(accuracy = 1), limits = c(0, 1))
summary_stats <- df %>%
summarize(
N = n(),
Mean = mean(rep_lean, na.rm = TRUE),
SD = sd(rep_lean, na.rm = TRUE),
Min = min(rep_lean, na.rm = TRUE),
Q1 = quantile(rep_lean, 0.25, na.rm = TRUE),
Median = median(rep_lean, na.rm = TRUE),
Q3 = quantile(rep_lean, 0.75, na.rm = TRUE),
Max = max(rep_lean, na.rm = TRUE)
)
summary_stats %>%
mutate(across(where(is.numeric), ~round(., 4))) %>%
kable(caption = "Summary Statistics for Republican Lean") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| N | Mean | SD | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|---|
| 2946045 | 0.5238 | 0.1565 | 0.0061 | 0.4174 | 0.5297 | 0.6432 | 0.9527 |
Do high-traffic locations differ in partisan lean from low-traffic ones?
df <- df %>%
mutate(
visitor_quartile = cut(
total_visitors,
breaks = quantile(total_visitors, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE),
labels = c("Q1 (Lowest)", "Q2", "Q3", "Q4 (Highest)"),
include.lowest = TRUE
)
)
volume_summary <- df %>%
group_by(visitor_quartile) %>%
summarize(
N = n(),
Mean_Visitors = mean(total_visitors),
Mean_RepLean = mean(rep_lean, na.rm = TRUE),
SD_RepLean = sd(rep_lean, na.rm = TRUE),
.groups = "drop"
)
volume_summary %>%
mutate(
Mean_Visitors = round(Mean_Visitors, 0),
Mean_RepLean = round(Mean_RepLean, 4),
SD_RepLean = round(SD_RepLean, 4)
) %>%
kable(caption = "Partisan Lean by Visitor Volume Quartile") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| visitor_quartile | N | Mean_Visitors | Mean_RepLean | SD_RepLean |
|---|---|---|---|---|
| Q1 (Lowest) | 759092 | 22 | 0.5368 | 0.1800 |
| Q2 | 715606 | 71 | 0.5189 | 0.1574 |
| Q3 | 735718 | 160 | 0.5164 | 0.1470 |
| Q4 (Highest) | 735629 | 1012 | 0.5223 | 0.1368 |
ggplot(df, aes(x = visitor_quartile, y = rep_lean, fill = visitor_quartile)) +
geom_boxplot(alpha = 0.7) +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "black") +
labs(
title = "Partisan Lean by Visitor Volume",
subtitle = "Higher-traffic locations tend to have slightly different visitor composition",
x = "Visitor Volume Quartile",
y = "Republican Lean"
) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_fill_brewer(palette = "Blues") +
theme(legend.position = "none")
# Filter to brands with sufficient data (at least 50 POI-months)
brand_summary <- df %>%
filter(!is.na(brands) & brands != "") %>%
group_by(brands) %>%
summarize(
n_poi_months = n(),
n_unique_pois = n_distinct(placekey),
total_visitors = sum(total_visitors, na.rm = TRUE),
mean_rep_lean = mean(rep_lean, na.rm = TRUE),
sd_rep_lean = sd(rep_lean, na.rm = TRUE),
.groups = "drop"
) %>%
filter(n_poi_months >= 50) %>%
arrange(desc(mean_rep_lean))
n_brands <- nrow(brand_summary)
1122 brands have at least 50 POI-month observations in Ohio.
brand_summary %>%
slice_head(n = 25) %>%
mutate(
mean_rep_lean = round(mean_rep_lean, 3),
sd_rep_lean = round(sd_rep_lean, 3),
total_visitors = format(total_visitors, big.mark = ",")
) %>%
select(Brand = brands, `POI-Months` = n_poi_months, `Unique POIs` = n_unique_pois,
`Total Visitors` = total_visitors, `Mean Rep Lean` = mean_rep_lean,
`SD` = sd_rep_lean) %>%
kable(caption = "Top 25 Most Republican-Leaning Brands") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Brand | POI-Months | Unique POIs | Total Visitors | Mean Rep Lean | SD |
|---|---|---|---|---|---|
| Pak-A-Sak / Party Mart | 289 | 28 | 264,119 | 0.756 | 0.068 |
| Nutrien Ag Solutions | 171 | 27 | 4,453 | 0.748 | 0.073 |
| Factory Connection | 91 | 8 | 26,548 | 0.743 | 0.044 |
| Case IH | 260 | 26 | 29,709 | 0.735 | 0.069 |
| Village Pantry | 91 | 9 | 48,930 | 0.731 | 0.068 |
| Giovanni’s | 382 | 34 | 95,331 | 0.728 | 0.059 |
| Agland Co-op | 306 | 48 | 11,124 | 0.727 | 0.074 |
| Chief Markets | 118 | 11 | 110,840 | 0.726 | 0.060 |
| Parkview Health | 215 | 41 | 95,856 | 0.723 | 0.030 |
| Bealls Outlet | 133 | 13 | 46,521 | 0.723 | 0.054 |
| Riesbeck’s Food Markets | 156 | 13 | 78,533 | 0.722 | 0.042 |
| WesBanco | 121 | 11 | 35,063 | 0.720 | 0.068 |
| Heritage Cooperative | 15 | 3 | 2,288 | 0.71 | 0.10 |
| Casey’s General Stores | 320 | 27 | 257,902 | 0.716 | 0.091 |
| Community Markets | 203 | 17 | 87,390 | 0.714 | 0.056 |
| Chevrolet,GMC (General Motors Company),Cadillac,Buick | 101 | 10 | 36,706 | 0.713 | 0.057 |
| Suburban Propane | 320 | 44 | 31,435 | 0.713 | 0.124 |
| FoodFair Markets | 57 | 6 | 26,201 | 0.712 | 0.032 |
| Clark’s Pump-n-Shop | 217 | 19 | 311,479 | 0.702 | 0.042 |
| Fox’s Pizza Den | 117 | 13 | 24,918 | 0.702 | 0.106 |
| Coen | 59 | 5 | 15,985 | 0.701 | 0.039 |
| Shoe Sensation | 310 | 28 | 51,536 | 0.699 | 0.073 |
| Fruth Pharmacy | 119 | 10 | 19,063 | 0.697 | 0.096 |
| Lassus Handy Dandy | 58 | 5 | 54,016 | 0.697 | 0.051 |
| Rural King | 325 | 28 | 608,653 | 0.697 | 0.057 |
brand_summary %>%
slice_tail(n = 25) %>%
arrange(mean_rep_lean) %>%
mutate(
mean_rep_lean = round(mean_rep_lean, 3),
sd_rep_lean = round(sd_rep_lean, 3),
total_visitors = format(total_visitors, big.mark = ",")
) %>%
select(Brand = brands, `POI-Months` = n_poi_months, `Unique POIs` = n_unique_pois,
`Total Visitors` = total_visitors, `Mean Rep Lean` = mean_rep_lean,
`SD` = sd_rep_lean) %>%
kable(caption = "Top 25 Most Democratic-Leaning Brands") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Brand | POI-Months | Unique POIs | Total Visitors | Mean Rep Lean | SD |
|---|---|---|---|---|---|
| Dave’s Markets | 132 | 11 | 46,387 | 0.231 | 0.080 |
| Ashley Stewart | 56 | 6 | 3,074 | 0.249 | 0.137 |
| Oak Street Health | 119 | 12 | 25,437 | 0.278 | 0.139 |
| Brightside Academy | 139 | 13 | 7,389 | 0.282 | 0.123 |
| Happy’s Pizza | 99 | 10 | 4,964 | 0.285 | 0.122 |
| DTR/VILLA | 220 | 20 | 77,080 | 0.287 | 0.134 |
| ChenMed | 103 | 9 | 20,965 | 0.290 | 0.103 |
| Rascal House Pizza | 72 | 6 | 15,648 | 0.294 | 0.139 |
| City Gear | 151 | 13 | 14,879 | 0.302 | 0.123 |
| ACE Cash Express | 237 | 25 | 9,133 | 0.304 | 0.145 |
| Church’s Chicken | 88 | 8 | 16,779 | 0.305 | 0.101 |
| PLS Financial Services | 106 | 9 | 21,579 | 0.309 | 0.134 |
| Shoppers World | 59 | 5 | 15,186 | 0.309 | 0.107 |
| Rainbow Shops | 269 | 26 | 173,636 | 0.320 | 0.127 |
| Loya Insurance Group | 74 | 9 | 56,296 | 0.327 | 0.168 |
| Octapharma Plasma | 118 | 10 | 13,959 | 0.329 | 0.110 |
| Citi Trends | 308 | 30 | 67,663 | 0.330 | 0.136 |
| Brassica | 60 | 5 | 29,210 | 0.334 | 0.058 |
| Mr. Chicken | 72 | 6 | 7,401 | 0.346 | 0.108 |
| First Merchants Bank | 101 | 24 | 12,778 | 0.352 | 0.092 |
| Talecris Plasma Resources | 59 | 5 | 4,558 | 0.354 | 0.119 |
| FirstMerit Bank | 107 | 9 | 85,822 | 0.357 | 0.051 |
| Georgios Oven Fresh Pizza Co. | 194 | 18 | 18,776 | 0.365 | 0.171 |
| MetroHealth System | 292 | 25 | 82,617 | 0.369 | 0.132 |
| Waterway Carwash | 59 | 5 | 30,453 | 0.371 | 0.109 |
# Top 30 brands by number of POI-month observations, for visualization
top_brands <- brand_summary %>%
slice_max(n_poi_months, n = 30) %>%
pull(brands)
df %>%
filter(brands %in% top_brands) %>%
mutate(brands = factor(brands, levels = brand_summary %>%
filter(brands %in% top_brands) %>%
arrange(mean_rep_lean) %>%
pull(brands))) %>%
ggplot(aes(x = rep_lean, y = brands, fill = after_stat(x))) +
geom_density_ridges_gradient(scale = 2, rel_min_height = 0.01) +
scale_fill_gradient2(
low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
midpoint = 0.5, name = "Rep Lean"
) +
geom_vline(xintercept = 0.5, linetype = "dashed", alpha = 0.5) +
labs(
title = "Partisan Lean Distribution by Brand",
subtitle = "Top 30 brands by POI-month count, ordered by mean Republican lean",
x = "Republican Lean",
y = NULL
) +
scale_x_continuous(labels = percent_format(accuracy = 1), limits = c(0.2, 0.8)) +
theme(legend.position = "right")
# Compare similar brands within categories
compare_brands <- c(
# Fast Food
"McDonald's", "Burger King", "Wendy's", "Chick-fil-A", "Taco Bell",
# Coffee
"Starbucks", "Dunkin'", "Tim Hortons",
# Grocery
"Walmart", "Target", "Kroger", "Whole Foods Market", "Aldi",
# Gas Stations
"Shell", "BP", "Speedway", "Circle K",
# Pharmacies
"CVS Pharmacy", "Walgreens", "Rite Aid"
)
brand_compare <- brand_summary %>%
filter(brands %in% compare_brands) %>%
mutate(
category = case_when(
brands %in% c("McDonald's", "Burger King", "Wendy's", "Chick-fil-A", "Taco Bell") ~ "Fast Food",
brands %in% c("Starbucks", "Dunkin'", "Tim Hortons") ~ "Coffee",
brands %in% c("Walmart", "Target", "Kroger", "Whole Foods Market", "Aldi") ~ "Grocery",
brands %in% c("Shell", "BP", "Speedway", "Circle K") ~ "Gas Stations",
brands %in% c("CVS Pharmacy", "Walgreens", "Rite Aid") ~ "Pharmacies",
TRUE ~ "Other"
)
)
ggplot(brand_compare, aes(x = reorder(brands, mean_rep_lean), y = mean_rep_lean, fill = mean_rep_lean)) +
geom_col() +
geom_errorbar(aes(ymin = mean_rep_lean - sd_rep_lean/sqrt(n_poi_months)*1.96,
ymax = mean_rep_lean + sd_rep_lean/sqrt(n_poi_months)*1.96),
width = 0.2, alpha = 0.7) +
geom_hline(yintercept = 0.5, linetype = "dashed") +
coord_flip() +
facet_wrap(~category, scales = "free_y", ncol = 2) +
scale_fill_gradient2(
low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
midpoint = 0.5, guide = "none"
) +
scale_y_continuous(labels = percent_format(accuracy = 1), limits = c(0.35, 0.65)) +
labs(
title = "Brand Comparison Within Categories",
subtitle = "Error bars show 95% CI for the mean",
x = NULL,
y = "Republican Lean"
)
city_summary <- df %>%
filter(!is.na(city) & city != "") %>%
group_by(city) %>%
summarize(
n_poi_months = n(),
n_unique_pois = n_distinct(placekey),
total_visitors = sum(total_visitors, na.rm = TRUE),
mean_rep_lean = mean(rep_lean, na.rm = TRUE),
sd_rep_lean = sd(rep_lean, na.rm = TRUE),
.groups = "drop"
) %>%
filter(n_poi_months >= 100)
n_cities <- nrow(city_summary)
884 cities have at least 100 POI-month observations.
city_summary %>%
slice_max(mean_rep_lean, n = 25) %>%
mutate(
mean_rep_lean = round(mean_rep_lean, 3),
sd_rep_lean = round(sd_rep_lean, 3),
total_visitors = format(total_visitors, big.mark = ",")
) %>%
select(City = city, `POI-Months` = n_poi_months, `Unique POIs` = n_unique_pois,
`Total Visitors` = total_visitors, `Mean Rep Lean` = mean_rep_lean) %>%
kable(caption = "Top 25 Most Republican-Leaning Cities") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| City | POI-Months | Unique POIs | Total Visitors | Mean Rep Lean |
|---|---|---|---|---|
| Fort Recovery | 865 | 110 | 117,051 | 0.863 |
| Maria Stein | 260 | 34 | 44,128 | 0.855 |
| St. Henry | 114 | 64 | 13,836 | 0.844 |
| Russia | 253 | 31 | 103,540 | 0.840 |
| Saint Henry | 684 | 84 | 144,125 | 0.839 |
| New Knoxville | 255 | 40 | 93,065 | 0.835 |
| Versailles | 1288 | 144 | 317,618 | 0.835 |
| Fort Loramie | 825 | 90 | 219,805 | 0.829 |
| Coldwater | 1540 | 171 | 295,551 | 0.826 |
| Glandorf | 202 | 29 | 27,643 | 0.825 |
| Ansonia | 292 | 36 | 30,314 | 0.823 |
| Kalida | 438 | 43 | 108,926 | 0.823 |
| Winesburg | 287 | 32 | 70,195 | 0.820 |
| Fort Jennings | 275 | 44 | 36,855 | 0.820 |
| Miller City | 122 | 13 | 23,264 | 0.820 |
| Minster | 1521 | 176 | 378,471 | 0.818 |
| New Bremen | 1009 | 118 | 219,748 | 0.815 |
| Continental | 382 | 45 | 49,392 | 0.813 |
| Columbus Grove | 725 | 91 | 131,223 | 0.813 |
| Willshire | 121 | 16 | 21,892 | 0.810 |
| De Graff | 303 | 41 | 49,901 | 0.809 |
| Spencerville | 599 | 78 | 75,397 | 0.809 |
| Botkins | 441 | 61 | 87,995 | 0.808 |
| Jackson Center | 523 | 68 | 88,890 | 0.808 |
| Ohio City | 132 | 19 | 16,424 | 0.805 |
city_summary %>%
slice_min(mean_rep_lean, n = 25) %>%
mutate(
mean_rep_lean = round(mean_rep_lean, 3),
sd_rep_lean = round(sd_rep_lean, 3),
total_visitors = format(total_visitors, big.mark = ",")
) %>%
select(City = city, `POI-Months` = n_poi_months, `Unique POIs` = n_unique_pois,
`Total Visitors` = total_visitors, `Mean Rep Lean` = mean_rep_lean) %>%
kable(caption = "Top 25 Most Democratic-Leaning Cities") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| City | POI-Months | Unique POIs | Total Visitors | Mean Rep Lean |
|---|---|---|---|---|
| East Cleveland | 941 | 220 | 189,521 | 0.160 |
| Cleveland Heights | 5288 | 778 | 984,713 | 0.224 |
| Shaker Heights | 3561 | 518 | 770,271 | 0.225 |
| Cleveland Hts | 257 | 37 | 26,865 | 0.240 |
| University Heights | 1368 | 242 | 745,184 | 0.242 |
| South Euclid | 3405 | 554 | 1,463,663 | 0.252 |
| North Randall | 897 | 129 | 459,394 | 0.252 |
| Maple Heights | 4088 | 472 | 1,040,897 | 0.256 |
| Bexley | 1530 | 249 | 383,140 | 0.271 |
| Whitehall | 3127 | 677 | 1,042,802 | 0.296 |
| Euclid | 7099 | 931 | 1,999,747 | 0.298 |
| Warrensville Heights | 2712 | 515 | 582,375 | 0.301 |
| Richmond Hts | 116 | 19 | 10,619 | 0.302 |
| Richmond Heights | 1213 | 131 | 337,301 | 0.303 |
| Highland Hills | 236 | 44 | 157,774 | 0.309 |
| Minerva Park | 101 | 51 | 23,784 | 0.309 |
| Garfield Hts | 147 | 22 | 17,137 | 0.319 |
| Garfield Heights | 3257 | 603 | 1,308,719 | 0.322 |
| Trotwood | 1894 | 366 | 1,001,554 | 0.329 |
| Lakewood | 11802 | 1274 | 2,214,538 | 0.338 |
| Bedford | 5948 | 731 | 1,486,574 | 0.338 |
| Beachwood | 18191 | 2073 | 5,118,848 | 0.339 |
| North College Hill | 428 | 162 | 128,356 | 0.342 |
| Brice | 140 | 18 | 14,497 | 0.342 |
| Grandview Heights | 1129 | 272 | 320,215 | 0.347 |
major_cities <- c("Columbus", "Cleveland", "Cincinnati", "Toledo", "Akron",
"Dayton", "Parma", "Canton", "Youngstown", "Lorain",
"Hamilton", "Springfield", "Kettering", "Elyria", "Lakewood")
major_city_data <- city_summary %>%
filter(city %in% major_cities) %>%
arrange(mean_rep_lean)
ggplot(major_city_data, aes(x = reorder(city, mean_rep_lean), y = mean_rep_lean, fill = mean_rep_lean)) +
geom_col() +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "black") +
geom_hline(yintercept = mean_lean, linetype = "dotted", color = "red") +
coord_flip() +
scale_fill_gradient2(
low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
midpoint = 0.5, guide = "none"
) +
scale_y_continuous(labels = percent_format(accuracy = 1), limits = c(0.3, 0.7)) +
labs(
title = "Partisan Lean in Major Ohio Cities",
subtitle = "Dashed line = 0.5 (even), dotted red = state average",
x = NULL,
y = "Republican Lean"
)
How much does partisan lean vary within cities?
city_variation <- city_summary %>%
filter(n_poi_months >= 500) %>% # Cities with substantial data
arrange(desc(sd_rep_lean))
ggplot(city_variation %>% slice_head(n = 30),
aes(x = reorder(city, sd_rep_lean), y = sd_rep_lean)) +
geom_col(fill = "steelblue", alpha = 0.7) +
coord_flip() +
labs(
title = "Within-City Variation in Partisan Lean",
subtitle = "Cities with highest standard deviation (min 500 POI-months)",
x = NULL,
y = "Standard Deviation of Republican Lean"
)
monthly_trend <- df %>%
group_by(month_date, month_label) %>%
summarize(
n_pois = n(),
mean_rep_lean = mean(rep_lean, na.rm = TRUE),
se_rep_lean = sd(rep_lean, na.rm = TRUE) / sqrt(n()),
total_visitors = sum(total_visitors, na.rm = TRUE),
.groups = "drop"
)
ggplot(monthly_trend, aes(x = month_date, y = mean_rep_lean)) +
geom_ribbon(aes(ymin = mean_rep_lean - 1.96*se_rep_lean,
ymax = mean_rep_lean + 1.96*se_rep_lean),
fill = "steelblue", alpha = 0.2) +
geom_line(color = "steelblue", linewidth = 1) +
geom_point(color = "steelblue", size = 2) +
geom_hline(yintercept = 0.5, linetype = "dashed") +
scale_x_date(date_labels = "%b", date_breaks = "1 month") +
scale_y_continuous(labels = percent_format(accuracy = 0.1)) +
labs(
title = "Monthly Trend in Aggregate Partisan Lean",
subtitle = "Shaded area shows 95% confidence interval",
x = "Month (2023)",
y = "Mean Republican Lean"
)
ggplot(monthly_trend, aes(x = month_date, y = total_visitors/1e6)) +
geom_col(fill = "steelblue", alpha = 0.7) +
scale_x_date(date_labels = "%b", date_breaks = "1 month") +
scale_y_continuous(labels = comma_format()) +
labs(
title = "Monthly Total Visitors",
subtitle = "Millions of visitor-CBG matches",
x = "Month (2023)",
y = "Total Visitors (Millions)"
)
category_monthly <- df %>%
filter(!is.na(top_category) & top_category != "") %>%
group_by(top_category, month_date) %>%
summarize(
mean_rep_lean = mean(rep_lean, na.rm = TRUE),
n = n(),
.groups = "drop"
) %>%
group_by(top_category) %>%
filter(sum(n) >= 1000) %>% # Categories with enough data
ungroup()
# Top 12 categories by volume
top_categories <- df %>%
filter(!is.na(top_category) & top_category != "") %>%
count(top_category, sort = TRUE) %>%
slice_head(n = 12) %>%
pull(top_category)
category_monthly %>%
filter(top_category %in% top_categories) %>%
ggplot(aes(x = month_date, y = mean_rep_lean, color = mean_rep_lean)) +
geom_line(linewidth = 0.8) +
geom_point(size = 1.5) +
geom_hline(yintercept = 0.5, linetype = "dashed", alpha = 0.5) +
facet_wrap(~top_category, scales = "free_y", ncol = 3) +
scale_x_date(date_labels = "%b", date_breaks = "3 months") +
scale_color_gradient2(
low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
midpoint = 0.5, guide = "none"
) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Monthly Partisan Lean Trends by Category",
subtitle = "Top 12 categories by POI-month count",
x = "Month (2023)",
y = "Republican Lean"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
category_summary <- df %>%
filter(!is.na(top_category) & top_category != "") %>%
group_by(top_category) %>%
summarize(
n_poi_months = n(),
n_unique_pois = n_distinct(placekey),
total_visitors = sum(total_visitors, na.rm = TRUE),
mean_rep_lean = mean(rep_lean, na.rm = TRUE),
sd_rep_lean = sd(rep_lean, na.rm = TRUE),
.groups = "drop"
) %>%
filter(n_poi_months >= 100) %>%
arrange(desc(mean_rep_lean))
category_summary %>%
slice_head(n = 20) %>%
mutate(
mean_rep_lean = round(mean_rep_lean, 3),
sd_rep_lean = round(sd_rep_lean, 3),
total_visitors = format(total_visitors, big.mark = ",")
) %>%
select(Category = top_category, `POI-Months` = n_poi_months,
`Mean Rep Lean` = mean_rep_lean, SD = sd_rep_lean) %>%
kable(caption = "Top 20 Most Republican-Leaning Categories") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Category | POI-Months | Mean Rep Lean | SD |
|---|---|---|---|
| Business, Professional, Labor, Political, and Similar Organizations | 306 | 0.727 | 0.074 |
| Miscellaneous Nondurable Goods Merchant Wholesalers | 667 | 0.716 | 0.076 |
| Direct Selling Establishments | 691 | 0.674 | 0.133 |
| RV (Recreational Vehicle) Parks and Recreational Camps | 2056 | 0.646 | 0.118 |
| Utility System Construction | 534 | 0.643 | 0.153 |
| Electric Power Generation, Transmission and Distribution | 999 | 0.626 | 0.159 |
| Postal Service | 10047 | 0.623 | 0.170 |
| Death Care Services | 31707 | 0.612 | 0.162 |
| Petroleum and Petroleum Products Merchant Wholesalers | 116 | 0.605 | 0.130 |
| Support Activities for Air Transportation | 1525 | 0.602 | 0.129 |
| Medical Equipment and Supplies Manufacturing | 117 | 0.602 | 0.125 |
| Other Food Manufacturing | 330 | 0.593 | 0.116 |
| Motor Vehicle Manufacturing | 162 | 0.593 | 0.086 |
| Grantmaking and Giving Services | 808 | 0.592 | 0.138 |
| Interurban and Rural Bus Transportation | 641 | 0.592 | 0.155 |
| Lumber and Other Construction Materials Merchant Wholesalers | 668 | 0.591 | 0.127 |
| Social Advocacy Organizations | 3019 | 0.589 | 0.133 |
| Other Motor Vehicle Dealers | 8404 | 0.588 | 0.142 |
| Greenhouse, Nursery, and Floriculture Production | 135 | 0.584 | 0.133 |
| Remediation and Other Waste Management Services | 817 | 0.581 | 0.174 |
category_summary %>%
slice_tail(n = 20) %>%
arrange(mean_rep_lean) %>%
mutate(
mean_rep_lean = round(mean_rep_lean, 3),
sd_rep_lean = round(sd_rep_lean, 3),
total_visitors = format(total_visitors, big.mark = ",")
) %>%
select(Category = top_category, `POI-Months` = n_poi_months,
`Mean Rep Lean` = mean_rep_lean, SD = sd_rep_lean) %>%
kable(caption = "Top 20 Most Democratic-Leaning Categories") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Category | POI-Months | Mean Rep Lean | SD |
|---|---|---|---|
| Sound Recording Industries | 1225 | 0.434 | 0.160 |
| Investigation and Security Services | 13840 | 0.443 | 0.152 |
| Rail Transportation | 328 | 0.444 | 0.144 |
| Other Transit and Ground Passenger Transportation | 506 | 0.451 | 0.136 |
| Management, Scientific, and Technical Consulting Services | 8212 | 0.458 | 0.133 |
| Glass and Glass Product Manufacturing | 372 | 0.458 | 0.132 |
| Taxi and Limousine Service | 2411 | 0.459 | 0.166 |
| Advertising, Public Relations, and Related Services | 9433 | 0.461 | 0.140 |
| Architectural, Engineering, and Related Services | 4833 | 0.462 | 0.151 |
| Securities and Commodity Contracts Intermediation and Brokerage | 183 | 0.463 | 0.096 |
| Performing Arts Companies | 610 | 0.465 | 0.130 |
| Legal Services | 87269 | 0.465 | 0.136 |
| Community Food and Housing, and Emergency and Other Relief Services | 1278 | 0.466 | 0.188 |
| Metalworking Machinery Manufacturing | 202 | 0.473 | 0.151 |
| Business Support Services | 316 | 0.477 | 0.139 |
| Promoters of Performing Arts, Sports, and Similar Events | 5918 | 0.479 | 0.150 |
| Activities Related to Real Estate | 13649 | 0.479 | 0.149 |
| Data Processing, Hosting, and Related Services | 452 | 0.479 | 0.108 |
| Drinking Places (Alcoholic Beverages) | 26960 | 0.480 | 0.152 |
| Drycleaning and Laundry Services | 12708 | 0.484 | 0.162 |
ggplot(category_summary %>% filter(n_poi_months >= 500),
aes(x = reorder(top_category, mean_rep_lean), y = mean_rep_lean, fill = mean_rep_lean)) +
geom_col() +
geom_hline(yintercept = 0.5, linetype = "dashed") +
coord_flip() +
scale_fill_gradient2(
low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
midpoint = 0.5, guide = "none"
) +
scale_y_continuous(labels = percent_format(accuracy = 1), limits = c(0.35, 0.65)) +
labs(
title = "Partisan Lean by Top Category",
subtitle = "Categories with 500+ POI-months",
x = NULL,
y = "Republican Lean"
)
restaurant_summary <- df %>%
filter(grepl("Restaurant|Food|Eating", top_category, ignore.case = TRUE) |
grepl("Restaurant|Food|Eating", sub_category, ignore.case = TRUE)) %>%
filter(!is.na(sub_category) & sub_category != "") %>%
group_by(sub_category) %>%
summarize(
n_poi_months = n(),
mean_rep_lean = mean(rep_lean, na.rm = TRUE),
sd_rep_lean = sd(rep_lean, na.rm = TRUE),
.groups = "drop"
) %>%
filter(n_poi_months >= 100) %>%
arrange(mean_rep_lean)
ggplot(restaurant_summary,
aes(x = reorder(sub_category, mean_rep_lean), y = mean_rep_lean, fill = mean_rep_lean)) +
geom_col() +
geom_hline(yintercept = 0.5, linetype = "dashed") +
coord_flip() +
scale_fill_gradient2(
low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
midpoint = 0.5, guide = "none"
) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Partisan Lean by Restaurant Sub-Category",
subtitle = "Sub-categories with 100+ POI-months",
x = NULL,
y = "Republican Lean"
)
naics_summary <- df %>%
filter(!is.na(naics_code) & naics_code != "") %>%
group_by(naics_code, sub_category) %>%
summarize(
n_poi_months = n(),
mean_rep_lean = mean(rep_lean, na.rm = TRUE),
.groups = "drop"
) %>%
filter(n_poi_months >= 200) %>%
arrange(desc(mean_rep_lean))
# Show top and bottom 15
bind_rows(
naics_summary %>% slice_head(n = 15) %>% mutate(group = "Most Republican"),
naics_summary %>% slice_tail(n = 15) %>% mutate(group = "Most Democratic")
) %>%
mutate(label = paste0(naics_code, ": ", sub_category)) %>%
ggplot(aes(x = reorder(label, mean_rep_lean), y = mean_rep_lean, fill = mean_rep_lean)) +
geom_col() +
geom_hline(yintercept = 0.5, linetype = "dashed") +
coord_flip() +
facet_wrap(~group, scales = "free_y", ncol = 1) +
scale_fill_gradient2(
low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
midpoint = 0.5, guide = "none"
) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Partisan Lean by NAICS Code",
subtitle = "Top 15 most Republican and Democratic NAICS codes (200+ POI-months)",
x = NULL,
y = "Republican Lean"
) +
theme(axis.text.y = element_text(size = 8))
# Classify into broad sectors based on NAICS
df_sector <- df %>%
mutate(
sector = case_when(
grepl("^44|^45", naics_code) ~ "Retail Trade",
grepl("^72", naics_code) ~ "Accommodation & Food",
grepl("^62", naics_code) ~ "Health Care",
grepl("^71", naics_code) ~ "Arts & Entertainment",
grepl("^52", naics_code) ~ "Finance & Insurance",
grepl("^81", naics_code) ~ "Other Services",
grepl("^61", naics_code) ~ "Education",
grepl("^48|^49", naics_code) ~ "Transportation",
TRUE ~ "Other"
)
)
sector_summary <- df_sector %>%
filter(sector != "Other") %>%
group_by(sector) %>%
summarize(
n_poi_months = n(),
mean_rep_lean = mean(rep_lean, na.rm = TRUE),
sd_rep_lean = sd(rep_lean, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(mean_rep_lean)
ggplot(sector_summary, aes(x = reorder(sector, mean_rep_lean), y = mean_rep_lean, fill = mean_rep_lean)) +
geom_col() +
geom_errorbar(aes(ymin = mean_rep_lean - sd_rep_lean, ymax = mean_rep_lean + sd_rep_lean),
width = 0.2, alpha = 0.5) +
geom_hline(yintercept = 0.5, linetype = "dashed") +
coord_flip() +
scale_fill_gradient2(
low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
midpoint = 0.5, guide = "none"
) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Partisan Lean by Economic Sector",
subtitle = "Based on NAICS code groupings; error bars show +/- 1 SD",
x = NULL,
y = "Republican Lean"
)
A key question for any measure is whether it is reliable. If partisan lean fluctuates wildly month-to-month, it would suggest we’re measuring noise rather than signal. This section examines temporal stability.
monthly_stats <- df %>%
group_by(month) %>%
summarize(
n_pois = n(),
mean_lean = mean(rep_lean, na.rm = TRUE),
sd_lean = sd(rep_lean, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(month_label = format(as.Date(month), "%b"))
monthly_stats %>%
select(Month = month_label, `N POIs` = n_pois,
`Mean Rep Lean` = mean_lean, `Std Dev` = sd_lean) %>%
mutate(
`Mean Rep Lean` = percent(`Mean Rep Lean`, accuracy = 0.1),
`Std Dev` = round(`Std Dev`, 3)
) %>%
kable(caption = "Monthly Partisan Lean Statistics") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Month | N POIs | Mean Rep Lean | Std Dev |
|---|---|---|---|
| Jan | 239050 | 52.4% | 0.159 |
| Feb | 238258 | 52.3% | 0.159 |
| Mar | 243207 | 52.7% | 0.156 |
| Apr | 243651 | 52.5% | 0.157 |
| May | 243760 | 52.6% | 0.154 |
| Jun | 265818 | 52.1% | 0.155 |
| Jul | 225999 | 52.7% | 0.153 |
| Aug | 268512 | 52.5% | 0.154 |
| Sep | 208370 | 52.9% | 0.155 |
| Oct | 264254 | 52.2% | 0.156 |
| Nov | 253244 | 51.6% | 0.161 |
| Dec | 251922 | 52.1% | 0.158 |
Key finding: The monthly mean ranges from 51.6% to 52.9%, a spread of only about 1.25 percentage points. At the aggregate level, the measure is highly stable over time.
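The spread can be checked directly from the monthly table. The sketch below uses the rounded means as printed there, so it recovers roughly 1.3 points rather than the 1.25 quoted in the text; the difference is presumably because the quoted figure was computed from unrounded values (an assumption, since the unrounded means are not shown).

```r
# Monthly mean Republican lean (percent), as printed in the monthly table
monthly_mean <- c(Jan = 52.4, Feb = 52.3, Mar = 52.7, Apr = 52.5,
                  May = 52.6, Jun = 52.1, Jul = 52.7, Aug = 52.5,
                  Sep = 52.9, Oct = 52.2, Nov = 51.6, Dec = 52.1)

# Spread between the highest and lowest month, in percentage points
spread <- diff(range(monthly_mean))

# Months that set the minimum and maximum
extremes <- names(monthly_mean)[c(which.min(monthly_mean), which.max(monthly_mean))]

round(spread, 1)
extremes
```

For scale, the same table shows a cross-sectional standard deviation of roughly 15 to 16 points in every month, so the month-to-month movement in the mean is small relative to the variation across POIs.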
For POIs observed all 12 months, how much does their partisan lean vary?
poi_temporal <- df %>%
group_by(placekey) %>%
summarize(
n_months = n_distinct(month),
mean_lean = mean(rep_lean, na.rm = TRUE),
sd_lean = sd(rep_lean, na.rm = TRUE),
.groups = "drop"
)
full_year_pois <- poi_temporal %>% filter(n_months == 12)
poi_stability_stats <- data.frame(
Metric = c(
"POIs with all 12 months",
"Mean within-POI std dev",
"Median within-POI std dev",
"25th percentile std dev",
"75th percentile std dev"
),
Value = c(
format(nrow(full_year_pois), big.mark = ","),
percent(mean(full_year_pois$sd_lean, na.rm = TRUE), accuracy = 0.1),
percent(median(full_year_pois$sd_lean, na.rm = TRUE), accuracy = 0.1),
percent(quantile(full_year_pois$sd_lean, 0.25, na.rm = TRUE), accuracy = 0.1),
percent(quantile(full_year_pois$sd_lean, 0.75, na.rm = TRUE), accuracy = 0.1)
)
)
poi_stability_stats %>%
kable(caption = "Within-POI Temporal Stability") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Metric | Value |
|---|---|
| POIs with all 12 months | 129,811 |
| Mean within-POI std dev | 4.5% |
| Median within-POI std dev | 3.7% |
| 25th percentile std dev | 2.4% |
| 75th percentile std dev | 5.8% |
ggplot(full_year_pois, aes(x = sd_lean)) +
geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
geom_vline(xintercept = median(full_year_pois$sd_lean, na.rm = TRUE),
color = "red", linetype = "dashed", linewidth = 1) +
annotate("text", x = median(full_year_pois$sd_lean, na.rm = TRUE) + 0.02,
y = Inf, label = "Median", vjust = 2, color = "red") +
labs(
title = "Distribution of Within-POI Standard Deviation",
subtitle = "POIs observed all 12 months; lower = more stable",
x = "Standard Deviation of Monthly Rep Lean",
y = "Count"
) +
scale_x_continuous(labels = percent_format(accuracy = 1))
# Get brand-month means
brand_monthly <- df %>%
filter(!is.na(brands) & brands != "") %>%
group_by(brands, month) %>%
summarize(mean_lean = mean(rep_lean, na.rm = TRUE), .groups = "drop") %>%
pivot_wider(names_from = month, values_from = mean_lean)
# Brands with all 12 months
brand_full <- brand_monthly %>%
filter(complete.cases(.))
n_brands_full <- nrow(brand_full)
# Calculate correlations between adjacent months
months_ordered <- sort(names(brand_full)[-1])
adj_corrs <- sapply(1:(length(months_ordered)-1), function(i) {
cor(brand_full[[months_ordered[i]]], brand_full[[months_ordered[i+1]]], use = "complete.obs")
})
jan_dec_corr <- cor(brand_full[[months_ordered[1]]], brand_full[[months_ordered[12]]], use = "complete.obs")
q1_q4_corr <- cor(
rowMeans(brand_full[, months_ordered[1:3]]),
rowMeans(brand_full[, months_ordered[10:12]]),
use = "complete.obs"
)
For 1857 brands with data in all 12 months:
These high correlations indicate that brand-level partisan lean is a stable characteristic, not monthly noise.
example_brands <- c("Starbucks", "Walmart", "Target", "McDonald's", "Kroger", "Chick-fil-A")
brand_traj <- df %>%
filter(brands %in% example_brands) %>%
group_by(brands, month_date) %>%
summarize(mean_lean = mean(rep_lean, na.rm = TRUE), .groups = "drop")
ggplot(brand_traj, aes(x = month_date, y = mean_lean, color = brands)) +
geom_line(linewidth = 1) +
geom_point(size = 2) +
geom_hline(yintercept = 0.5, linetype = "dashed", alpha = 0.5) +
scale_x_date(date_labels = "%b", date_breaks = "1 month") +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_color_brewer(palette = "Set1") +
labs(
title = "Monthly Partisan Lean Trajectories for Major Brands",
subtitle = "Brands maintain consistent relative positions throughout the year",
x = "Month (2023)",
y = "Mean Republican Lean",
color = "Brand"
) +
theme(legend.position = "bottom")
Interpretation: Brands maintain their relative positions throughout the year. Walmart is consistently the most Republican-leaning of these six brands, while Target and Starbucks are consistently the most Democratic-leaning. The measure is stable.
The previous analyses showed that geography dominates brand at the state level. But within a given neighborhood, do different types of businesses attract different partisan clienteles?
We use the Placekey geographic encoding to identify POIs in the same neighborhood (H3 geographic cell). This allows us to compare businesses that are physically close to each other.
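To make the neighborhood key concrete, here is a minimal sketch of the string handling, assuming the usual `what@where` Placekey layout in which the part after `@` encodes the location. The placekeys below are made up for illustration.

```r
# Hypothetical placekeys (not real POIs)
placekeys <- c(
  "222-223@5vg-7gq-tvz",
  "zzw-222@5vg-7gq-5mk",
  "223-222@5vg-82n-kzz"
)

# Keep only the "where" part after the @
geo_cell <- sub(".*@", "", placekeys)

# The first seven characters (two groups plus the hyphen) define the coarser cell
geo_cell_fine <- substr(geo_cell, 1, 7)

data.frame(placekey = placekeys, geo_cell, geo_cell_fine)
```

In this toy input the first two POIs share the prefix `5vg-7gq` and would be treated as the same neighborhood, while the third falls in a different one.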
# Create POI-level averages
poi_avg <- df %>%
group_by(placekey, brands, location_name, city, top_category) %>%
summarize(
mean_lean = mean(rep_lean, na.rm = TRUE),
total_visitors = sum(total_visitors, na.rm = TRUE),
.groups = "drop"
)
# Extract geographic cell from placekey
poi_avg <- poi_avg %>%
mutate(
geo_cell = sub(".*@", "", placekey),
geo_cell_fine = substr(geo_cell, 1, 7)
)
major_cities <- c("Cleveland", "Columbus", "Cincinnati", "Dayton", "Akron", "Toledo")
neighborhood_stats <- lapply(major_cities, function(city_name) {
city_pois <- poi_avg %>% filter(city == city_name)
cell_stats <- city_pois %>%
group_by(geo_cell_fine) %>%
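summarize(
n_pois = n(),
# Spread statistics come first: summarize() evaluates its arguments in order,
# so overwriting mean_lean earlier would leave sd/min/max with a single value
sd_lean = sd(mean_lean, na.rm = TRUE),
min_lean = min(mean_lean, na.rm = TRUE),
max_lean = max(mean_lean, na.rm = TRUE),
range_lean = max_lean - min_lean,
mean_lean = mean(mean_lean, na.rm = TRUE),
.groups = "drop"
) %>%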
filter(n_pois >= 5)
data.frame(
City = city_name,
`N Neighborhoods` = nrow(cell_stats),
`Avg Within-Neighborhood SD` = mean(cell_stats$sd_lean, na.rm = TRUE),
`Avg Within-Neighborhood Range` = mean(cell_stats$range_lean, na.rm = TRUE)
)
}) %>%
bind_rows()
neighborhood_stats %>%
mutate(
`Avg Within-Neighborhood SD` = percent(`Avg.Within.Neighborhood.SD`, accuracy = 0.1),
`Avg Within-Neighborhood Range` = percent(`Avg.Within.Neighborhood.Range`, accuracy = 0.1)
) %>%
select(City, `N Neighborhoods` = N.Neighborhoods,
`Avg SD` = `Avg Within-Neighborhood SD`,
`Avg Range` = `Avg Within-Neighborhood Range`) %>%
kable(caption = "Within-Neighborhood Variation by City (neighborhoods with 5+ POIs)") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| City | N Neighborhoods | Avg SD | Avg Range |
|---|---|---|---|
| Cleveland | 209 | NA | 0.0% |
| Columbus | 256 | NA | 0.0% |
| Cincinnati | 248 | NA | 0.0% |
| Dayton | 161 | NA | 0.0% |
| Akron | 96 | NA | 0.0% |
| Toledo | 98 | NA | 0.0% |
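Interpretation: The NA and 0.0% entries above do not measure within-neighborhood spread. They are what `sd()` and a max-minus-min range return when applied to a single, already-averaged value per cell. The table as rendered therefore does not support the stated figure of approximately 15-19 percentage points of within-neighborhood variation. If that figure holds once the spread is computed over individual POIs, it would be substantial and would suggest that business type influences visitor composition even within the same location.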
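The pattern of NA standard deviations and zero ranges can be reproduced on toy data. The sketch below, which uses made-up values and the hypothetical names `overwritten` and `safe`, shows how `summarize()` behaves when a column is reassigned before later statistics are computed, and one ordering that avoids it.

```r
library(dplyr)

# Toy POI-level leans in two cells (illustrative values only)
toy <- tibble(
  geo_cell_fine = c("a", "a", "a", "b", "b"),
  mean_lean = c(0.40, 0.50, 0.60, 0.30, 0.50)
)

# mean_lean is overwritten first, so sd/max/min see one value per group
overwritten <- toy %>%
  group_by(geo_cell_fine) %>%
  summarize(
    mean_lean = mean(mean_lean),
    sd_lean = sd(mean_lean),
    range_lean = max(mean_lean) - min(mean_lean),
    .groups = "drop"
  )

# Spread statistics are computed before the mean, under a new column name
safe <- toy %>%
  group_by(geo_cell_fine) %>%
  summarize(
    sd_lean = sd(mean_lean),
    range_lean = max(mean_lean) - min(mean_lean),
    cell_mean = mean(mean_lean),
    .groups = "drop"
  )

overwritten
safe
```

Giving the summary a different name, such as `cell_mean`, is arguably the more robust choice because the result no longer depends on argument order.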
When the same brand has multiple locations in the same neighborhood, how similar are they?
brand_cell <- poi_avg %>%
filter(!is.na(brands) & brands != "") %>%
group_by(brands, geo_cell_fine) %>%
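summarize(
n = n(),
# Spread statistics come first: summarize() evaluates its arguments in order,
# so overwriting mean_lean earlier would leave sd/min/max with a single value
sd_lean = sd(mean_lean, na.rm = TRUE),
min_lean = min(mean_lean, na.rm = TRUE),
max_lean = max(mean_lean, na.rm = TRUE),
range_lean = max_lean - min_lean,
mean_lean = mean(mean_lean, na.rm = TRUE),
.groups = "drop"
) %>%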
filter(n >= 2)
same_brand_stats <- data.frame(
Metric = c(
"Brand-neighborhood pairs with 2+ locations",
"Average range within same brand & neighborhood",
"Median range within same brand & neighborhood"
),
Value = c(
format(nrow(brand_cell), big.mark = ","),
percent(mean(brand_cell$range_lean, na.rm = TRUE), accuracy = 0.1),
percent(median(brand_cell$range_lean, na.rm = TRUE), accuracy = 0.1)
)
)
same_brand_stats %>%
kable(caption = "Same Brand, Same Neighborhood Consistency") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| Metric | Value |
|---|---|
| Brand-neighborhood pairs with 2+ locations | 8,043 |
| Average range within same brand & neighborhood | 0.0% |
| Median range within same brand & neighborhood | 0.0% |
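Key finding: The 0.0% average and median ranges above are not evidence of consistency. A range computed on one already-averaged value per brand-neighborhood pair is zero by construction, so the table cannot confirm that same-brand locations in the same neighborhood attract similar visitors. Supporting the reliability claim requires the range to be computed across the individual locations in each pair.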
What proportion of variation is explained by neighborhood vs. business type?
variance_decomp <- lapply(c("Cleveland", "Columbus", "Cincinnati"), function(city_name) {
city_pois <- poi_avg %>% filter(city == city_name)
# Only cells with 10+ POIs and 3+ distinct brands
cell_counts <- city_pois %>%
group_by(geo_cell_fine) %>%
summarize(
n_pois = n(),
n_brands = n_distinct(brands, na.rm = TRUE),
.groups = "drop"
) %>%
filter(n_pois >= 10, n_brands >= 3)
if (nrow(cell_counts) == 0) return(NULL)
subset <- city_pois %>% filter(geo_cell_fine %in% cell_counts$geo_cell_fine)
total_var <- var(subset$mean_lean, na.rm = TRUE)
cell_means <- subset %>%
group_by(geo_cell_fine) %>%
summarize(cell_mean = mean(mean_lean, na.rm = TRUE), .groups = "drop")
between_cell_var <- var(cell_means$cell_mean, na.rm = TRUE)
within_cell_var <- total_var - between_cell_var
data.frame(
City = city_name,
`N Diverse Neighborhoods` = nrow(cell_counts),
`Between-Neighborhood` = between_cell_var / total_var,
`Within-Neighborhood` = within_cell_var / total_var
)
}) %>%
bind_rows()
variance_decomp %>%
mutate(
`Between-Neighborhood` = percent(`Between.Neighborhood`, accuracy = 0.1),
`Within-Neighborhood` = percent(`Within.Neighborhood`, accuracy = 0.1)
) %>%
select(City, `Diverse Neighborhoods` = N.Diverse.Neighborhoods,
`Between-Neighborhood`, `Within-Neighborhood`) %>%
kable(caption = "Variance Decomposition: Geography vs. Business Type") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| City | Diverse Neighborhoods | Between-Neighborhood | Within-Neighborhood |
|---|---|---|---|
| Cleveland | 110 | 97.2% | 2.8% |
| Columbus | 189 | 61.2% | 38.8% |
| Cincinnati | 165 | 69.2% | 30.8% |
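Interpretation: Even within cities, neighborhood accounts for most of the variance in partisan lean: 61.2% in Columbus, 69.2% in Cincinnati, and 97.2% in Cleveland. The remaining 2.8-38.8% is within-neighborhood variation, which includes business type along with any other POI-level differences. Geography is the dominant factor, but within-neighborhood variation is not negligible in Columbus and Cincinnati.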
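The shares above come from subtracting the unweighted variance of cell means from the total variance, which only approximates a between/within split and can drift when cells differ in size. One alternative, sketched here on made-up data with hypothetical column names `cell` and `lean`, is the sums-of-squares identity, in which the between and within components add up to the total exactly.

```r
# Toy POI-level data with unequal cell sizes (illustrative values only)
toy <- data.frame(
  cell = c("a", "a", "a", "b", "b", "c", "c", "c", "c"),
  lean = c(0.40, 0.45, 0.50, 0.60, 0.70, 0.30, 0.35, 0.32, 0.38)
)

grand_mean <- mean(toy$lean)
cell_mean <- ave(toy$lean, toy$cell)  # each POI's own cell mean

# Sums of squares: total = between + within
ss_total <- sum((toy$lean - grand_mean)^2)
ss_between <- sum((cell_mean - grand_mean)^2)  # one term per POI, so cells are size-weighted
ss_within <- sum((toy$lean - cell_mean)^2)

between_share <- ss_between / ss_total
within_share <- ss_within / ss_total

round(c(between = between_share, within = within_share), 3)
```

Because each POI contributes one term to `ss_between`, large cells count for more than small ones, and the two shares sum to one.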
# Find high-density neighborhoods in major cities
example_neighborhoods <- poi_avg %>%
filter(city %in% c("Cleveland", "Columbus", "Cincinnati")) %>%
group_by(city, geo_cell_fine) %>%
summarize(
n_pois = n(),
# Compute the range before mean_lean is overwritten with the cell mean
range_lean = max(mean_lean, na.rm = TRUE) - min(mean_lean, na.rm = TRUE),
mean_lean = mean(mean_lean, na.rm = TRUE),
.groups = "drop"
) %>%
group_by(city) %>%
slice_max(n_pois, n = 1) %>%
ungroup()
# Get POI details for these neighborhoods
neighborhood_details <- poi_avg %>%
inner_join(example_neighborhoods %>% select(city, geo_cell_fine),
by = c("city", "geo_cell_fine")) %>%
mutate(brand_label = ifelse(!is.na(brands) & brands != "", brands,
substr(location_name, 1, 20)))
# Plot
ggplot(neighborhood_details, aes(x = reorder(brand_label, mean_lean), y = mean_lean, fill = mean_lean)) +
geom_col() +
geom_hline(yintercept = 0.5, linetype = "dashed") +
coord_flip() +
facet_wrap(~city, scales = "free_y", ncol = 1) +
scale_fill_gradient2(
low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
midpoint = 0.5, guide = "none"
) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Partisan Lean Variation Within Dense Neighborhoods",
subtitle = "Highest-density neighborhood in each city; different businesses, same location",
x = NULL,
y = "Republican Lean"
) +
theme(axis.text.y = element_text(size = 7))
quality_summary <- data.frame(
Metric = c(
"Total POI-month records",
"Unique POIs",
"Months covered",
"Records with brand",
"Records with top_category",
"Records with city",
"Records with NAICS",
"Mean CBGs matched per POI"
),
Value = c(
format(nrow(df), big.mark = ","),
format(n_distinct(df$placekey), big.mark = ","),
n_distinct(df$month),
paste0(round(mean(!is.na(df$brands) & df$brands != "") * 100, 1), "%"),
paste0(round(mean(!is.na(df$top_category) & df$top_category != "") * 100, 1), "%"),
paste0(round(mean(!is.na(df$city) & df$city != "") * 100, 1), "%"),
paste0(round(mean(!is.na(df$naics_code) & df$naics_code != "") * 100, 1), "%"),
round(mean(df$cbg_count, na.rm = TRUE), 1)
)
)
quality_summary %>%
kable(caption = "Data Quality Summary") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| Metric | Value |
|---|---|
| Total POI-month records | 2,946,045 |
| Unique POIs | 327,739 |
| Months covered | 12 |
| Records with brand | 19.8% |
| Records with top_category | 100% |
| Records with city | 100% |
| Records with NAICS | 100% |
| Mean CBGs matched per POI | 21.8 |
## R version 4.4.0 (2024-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Rocky Linux 8.10 (Green Obsidian)
##
## Matrix products: default
## BLAS/LAPACK: /global/software/rocky-8.x86_64/gcc/linux-rocky8-x86_64/gcc-11.4.0/intel-oneapi-mkl-2023.2.0-j6xwxvd7plrt7ayfmgfic3r3zyrvuevg/mkl/2023.2.0/lib/intel64/libmkl_rt.so.2; LAPACK version 3.10.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
## [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C.utf8
## [7] LC_PAPER=C.utf8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.utf8 LC_IDENTIFICATION=C
##
## time zone: America/Los_Angeles
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggridges_0.5.7 kableExtra_1.4.0 knitr_1.45 scales_1.4.0
## [5] tidyr_1.3.1 ggplot2_3.5.2 dplyr_1.1.4 arrow_16.1.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0.1 gtable_0.3.6 jsonlite_2.0.0 highr_0.11
## [5] compiler_4.4.0 tidyselect_1.2.1 xml2_1.3.6 stringr_1.5.2
## [9] dichromat_2.0-0.1 assertthat_0.2.1 jquerylib_0.1.4 systemfonts_1.0.5
## [13] yaml_2.3.10 fastmap_1.2.0 R6_2.6.1 labeling_0.4.3
## [17] generics_0.1.4 tibble_3.3.0 svglite_2.1.3 bslib_0.6.1
## [21] pillar_1.11.1 RColorBrewer_1.1-3 rlang_1.1.5 stringi_1.8.7
## [25] cachem_1.1.0 xfun_0.50 sass_0.4.8 bit64_4.6.0-1
## [29] viridisLite_0.4.2 cli_3.6.5 withr_3.0.2 magrittr_2.0.3
## [33] digest_0.6.37 grid_4.4.0 rstudioapi_0.17.1 lifecycle_1.0.4
## [37] vctrs_0.6.5 evaluate_1.0.3 glue_1.8.0 farver_2.1.2
## [41] rmarkdown_2.25 purrr_1.1.0 tools_4.4.0 pkgconfig_2.0.3
## [45] htmltools_0.5.8.1
Report generated: 2025-12-31 10:18:31.86439
Data source: Advan foot traffic data (2023) merged with CBG-level 2020 election results